48 research outputs found

    Application of sound source separation methods to advanced spatial audio systems

    This thesis belongs to the field of Sound Source Separation (SSS). It addresses the development and evaluation of SSS techniques for the resynthesis of highly realistic sound scenes by means of Wave Field Synthesis (WFS). Because the vast majority of audio recordings are preserved in two-channel stereo format, special up-converters are required to use advanced spatial audio reproduction formats such as WFS. The reason is that WFS needs the original source signals in order to accurately synthesize the acoustic field inside an extended listening area, so an object-based mix is required. Source separation problems in digital signal processing are those in which several signals have been mixed together and the objective is to recover the original signals. SSS algorithms can therefore be applied to existing two-channel mixtures to extract the different objects that compose the stereo scene. Unfortunately, most stereo mixtures are underdetermined, i.e., there are more sound sources than audio channels. This condition makes the SSS problem especially difficult, and stronger assumptions have to be made, often related to the sparsity of the sources under some signal transformation. This thesis focuses on the application of SSS techniques to spatial sound reproduction, and its contributions fall within these two areas. First, two underdetermined SSS methods are proposed to deal efficiently with the separation of stereo sound mixtures. These techniques are based on a multi-level thresholding segmentation approach, which enables fast, unsupervised separation of sound sources in the time-frequency domain. Although both techniques rely on the same type of clustering, the features each one considers are related to different localization cues, which enable the separation of either instantaneous or real mixtures. Additionally, two post-processing techniques aimed at improving the isolation of the separated sources are proposed. The performance achieved by several SSS methods in the resynthesis of WFS sound scenes is then evaluated by means of listening tests, paying special attention to the change observed in the perceived spatial attributes. Although the estimated sources are distorted versions of the original ones, the masking effects involved in their spatial remixing make artifacts less perceptible, which improves the overall assessed quality. Finally, some novel developments related to the application of time-frequency processing to source localization and enhanced sound reproduction are presented.
    Cobos Serrano, M. (2009). Application of sound source separation methods to advanced spatial audio systems [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/8969
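The idea of separating an underdetermined stereo mixture by thresholding a localization cue in the time-frequency domain can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the cue definition (`pan_cue`), the function names, and the hand-picked thresholds. The abstract states only that a multi-level thresholding approach is used; how the thresholds are actually obtained is not shown here.

```python
from bisect import bisect_right


def pan_cue(left_mag, right_mag, eps=1e-12):
    """Hypothetical cue in [0, 1]: 0 means hard left, 1 means hard right."""
    return right_mag / (left_mag + right_mag + eps)


def binary_masks(left, right, thresholds):
    """Assign every time-frequency bin to exactly one source.

    left, right: 2-D lists (frames x bins) of STFT magnitudes.
    thresholds:  sorted cue values separating the sources. They are
                 supplied by hand here (assumption), not computed.
    Returns len(thresholds) + 1 masks of 0/1 values, shaped like the input.
    """
    n_sources = len(thresholds) + 1
    masks = [[[0] * len(row) for row in left] for _ in range(n_sources)]
    for t, (l_row, r_row) in enumerate(zip(left, right)):
        for f, (l_mag, r_mag) in enumerate(zip(l_row, r_row)):
            source = bisect_right(thresholds, pan_cue(l_mag, r_mag))
            masks[source][t][f] = 1
    return masks


# Toy magnitudes: bin 0 is mostly left, bin 1 centred, bin 2 mostly right.
left = [[1.0, 0.5, 0.1], [0.9, 0.5, 0.0]]
right = [[0.1, 0.5, 1.0], [0.0, 0.6, 0.8]]
masks = binary_masks(left, right, thresholds=[0.33, 0.66])
```

Multiplying each mask by the mixture spectrogram and inverting the transform would give one estimated signal per source. Because each bin goes to a single source, the approach relies on the sparsity assumption mentioned in the abstract.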

    Resynthesis of Acoustic Scenes Combining Sound Source Separation and WaveField Synthesis Techniques

    Source Separation has been a subject of intense research in many signal processing applications, ranging from speech processing to medical image analysis. Applied to spatial audio systems, it can be used to overcome one fundamental limitation in 3D scene resynthesis: the need to have the independent signals for each source available. Wave-field Synthesis (WFS) is a spatial sound reproduction system that can synthesize an acoustic field by means of loudspeaker arrays and can also position several sources in space. However, the individual signals corresponding to these sources must be available, and obtaining them is often difficult. In this work, we propose to use Sound Source Separation techniques to obtain different tracks from stereo and mono mixtures. Several separation methods have been implemented and tested, one of which was developed by the author. Although existing algorithms are far from achieving hi-fi quality, subjective tests show that an optimal separation is not necessary to obtain acceptable results in 3D scene reproduction.
    Cobos Serrano, M. (2007). Resynthesis of Acoustic Scenes Combining Sound Source Separation and WaveField Synthesis Techniques. http://hdl.handle.net/10251/12515

    Evaluación por compañeros de exposiciones orales

    The oral presentation process, besides being an important instrumental competence within the European Higher Education Area (EHEA), is fundamental to an engineer's work, because it allows engineers to communicate their knowledge and results to an audience effectively. To develop this competence, oral presentation activities are usually included in the subjects of the new EHEA-adapted degrees. The use of rubrics for the assessment of these presentations gives students an objective, clear and accurate view of the criteria employed in the evaluation process. Moreover, rubrics make it possible for students to rate their peers' work, which also helps them develop higher cognitive skills such as critical thinking and analytical ability. In this paper we present an experience aimed at developing these capabilities in first-year students. To this end, the students themselves were asked to assess their peers' presentations. The assessment was conducted within the 'Engineering, Society and University' subject, which is taught in several degrees: Computer Science, Multimedia Engineering and Telematics Engineering. In addition to describing the experience, the paper includes a study of the correlation between the students' assessments of the oral presentations and those of the teachers.

    Maximum a Posteriori Binary Mask Estimation for Underdetermined Source Separation Using Smoothed Posteriors

    Sound source separation has become a topic of intensive research in recent years. The research effort has been especially relevant for the underdetermined case, where a considerable number of sparse methods working in the time-frequency (T-F) domain have appeared. In this context, although binary masking seems to be a preferred choice for source demixing, the estimated masks differ substantially from the ideal ones. This paper proposes a maximum a posteriori (MAP) framework for binary mask estimation. To this end, class-conditional source probabilities according to the observed mixing parameters are modeled via ratios of dependent Cauchy distributions, while source priors are iteratively calculated from the observed histograms. Moreover, spatially smoothed posteriors in the T-F domain are proposed to avoid noisy estimates, showing that the estimated masks are closer to the ideal ones in terms of objective performance measures.
    This work was supported by the Spanish Ministry of Science and Innovation under project TEC2009-14414-C03-01. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Jingdong Chen.
    Cobos Serrano, M.; López Monfort, JJ. (2012). Maximum a Posteriori Binary Mask Estimation for Underdetermined Source Separation Using Smoothed Posteriors. IEEE Transactions on Audio, Speech and Language Processing. 20(7):2059-2064. doi:10.1109/TASL.2012.2195654
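The effect of smoothing posteriors before taking the MAP decision can be sketched in a few lines. This is not the paper's model. As simplifying assumptions, the class-conditional likelihood is a plain Cauchy density (the paper uses ratios of dependent Cauchy distributions), the priors are fixed (the paper estimates them iteratively from histograms), and the smoothing is a box average over neighbouring T-F bins. All function names are hypothetical.

```python
import math


def cauchy_pdf(x, loc, scale):
    """Plain Cauchy density, a stand-in for the paper's likelihood model."""
    return scale / (math.pi * (scale ** 2 + (x - loc) ** 2))


def posteriors(cue, locs, scale, priors):
    """Posterior probability of each source for one observed mixing cue."""
    joint = [p * cauchy_pdf(cue, m, scale) for p, m in zip(priors, locs)]
    total = sum(joint)
    return [j / total for j in joint]


def smooth(post, radius):
    """Box-average each posterior over a (2*radius+1)^2 T-F neighbourhood."""
    n_t, n_f, n_k = len(post), len(post[0]), len(post[0][0])
    out = [[[0.0] * n_k for _ in range(n_f)] for _ in range(n_t)]
    for t in range(n_t):
        for f in range(n_f):
            cells = [post[i][j]
                     for i in range(max(0, t - radius), min(n_t, t + radius + 1))
                     for j in range(max(0, f - radius), min(n_f, f + radius + 1))]
            for k in range(n_k):
                out[t][f][k] = sum(c[k] for c in cells) / len(cells)
    return out


def map_mask(cues, locs, scale, priors, radius=1):
    """Label each T-F bin with the source of largest smoothed posterior."""
    post = [[posteriors(c, locs, scale, priors) for c in row] for row in cues]
    smoothed = smooth(post, radius)
    return [[max(range(len(locs)), key=lambda k: p[k]) for p in row]
            for row in smoothed]


# A 3x3 patch dominated by source 0, with one outlying bin in the centre.
cues = [[0.2, 0.2, 0.2], [0.2, 0.8, 0.2], [0.2, 0.2, 0.2]]
raw = map_mask(cues, locs=[0.2, 0.8], scale=0.05, priors=[0.5, 0.5], radius=0)
smoothed = map_mask(cues, locs=[0.2, 0.8], scale=0.05, priors=[0.5, 0.5], radius=1)
```

With `radius=0` the isolated centre bin is assigned to source 1. With `radius=1` its neighbours outvote it and the whole patch goes to source 0, which is the kind of noisy estimate the smoothing is meant to suppress.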

    An Efficient Implementation of Parallel Parametric HRTF Models for Binaural Sound Synthesis in Mobile Multimedia

    The widespread use of mobile multimedia devices in applications such as gaming, 3D video and audio reproduction, immersive teleconferencing, and virtual and augmented reality demands efficient algorithms and methodologies. All these applications require real-time spatial audio engines capable of handling intensive signal processing operations while facing a number of constraints related to computational cost, latency and energy consumption. Most mobile multimedia devices include a Graphics Processing Unit (GPU) that is primarily used to accelerate video processing tasks, providing high computational capabilities due to its inherently parallel architecture. This paper describes a scalable parallel implementation of a real-time binaural audio engine for GPU-equipped mobile devices. The engine is based on a set of head-related transfer functions (HRTFs) modelled with a parametric parallel structure, allowing efficient synthesis and interpolation while reducing the size required for HRTF data storage. Several strategies to optimize the GPU implementation are evaluated on a well-known kind of processor present in a wide range of mobile devices. In this context, we analyze both the energy consumption and the real-time capabilities of the system by exploring different GPU and CPU configuration alternatives. Moreover, the implementation has been carried out using the OpenCL framework, which makes the code portable.

    Computer-based detection and classification of flaws in citrus fruits

    In this paper, a system for quality control in citrus fruits is presented. In current citrus manufacturing industries, calliper and color are successfully used for the automatic classification of fruits using vision systems. However, the detection of flaws in the citrus surface is carried out by means of human inspection. In this work, a computer vision system capable of detecting defects in the citrus peel and also classifying the type of flaw is presented. First, a review of citrus diseases has been carried out in order to build a database of digitized oranges classified by the kind of fault, which is used as a training set. The segmentation of faulty zones is performed by applying the Sobel gradient to the image. Afterwards, color and texture features of the flaw are extracted considering different color spaces, some of them related to higher-order statistics. Several techniques have been employed for classification purposes: Euclidean distance to a prototype, nearest neighbor and k-nearest neighbors. Additionally, a three-layer neural network has been tested and compared, obtaining promising results.
    López Monfort, JJ.; Cobos Serrano, M.; Aguilera Martí, E. (2011). Computer-based detection and classification of flaws in citrus fruits. Neural Computing and Applications. 20(7):975-981. doi:10.1007/s00521-010-0396-2
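The two building blocks named in the citrus abstract, Sobel-gradient segmentation and k-nearest-neighbour classification, can be sketched in plain Python. This is an illustrative sketch only: the tiny image, the two-dimensional feature vectors and the class labels (`"scar"`, `"mold"`) are hypothetical, and the distance used here is Euclidean via `math.dist`. The paper's actual features and parameters are not reproduced.

```python
import math
from collections import Counter

# Standard 3x3 Sobel kernels for horizontal and vertical gradients.
KX = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
KY = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_magnitude(img):
    """Gradient magnitude of a grey-level image (2-D list); borders stay 0."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = sum(KX[j][i] * img[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            gy = sum(KY[j][i] * img[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            out[y][x] = math.hypot(gx, gy)
    return out


def knn_classify(sample, training, k=3):
    """Majority label among the k training vectors closest to the sample."""
    nearest = sorted(training, key=lambda item: math.dist(sample, item[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# A 4x4 image with a vertical step edge between columns 1 and 2.
image = [[0, 0, 10, 10] for _ in range(4)]
edges = sobel_magnitude(image)

# Hypothetical (feature vector, label) pairs standing in for a training set.
training = [((0.9, 0.1), "scar"), ((0.8, 0.2), "scar"),
            ((0.1, 0.9), "mold"), ((0.2, 0.8), "mold"), ((0.15, 0.85), "mold")]
label = knn_classify((0.85, 0.15), training, k=3)
```

Thresholding `edges` would give a candidate defect boundary, and features extracted inside that boundary would then be passed to `knn_classify`. Setting `k=1` reduces the classifier to the nearest-neighbour rule.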

    Evaluación por Compañeros de Exposiciones Orales

    The oral presentation process, besides being an important instrumental competence within the European Higher Education Area (EHEA), is fundamental to an engineer's work, because it allows engineers to communicate their knowledge and results to an audience effectively. To develop this competence, oral presentation activities are usually included in the subjects of the new EHEA-adapted degrees. The use of rubrics for the assessment of these presentations gives students an objective, clear and accurate view of the criteria employed in the evaluation process. Moreover, rubrics make it possible for students to rate their peers' work, which also helps them develop higher cognitive skills such as critical thinking and analytical ability. In this paper we present an experience aimed at developing these capabilities in first-year students. To this end, the students themselves were asked to assess their peers' presentations. The assessment was conducted within the 'Engineering, Society and University' subject, which is taught in several degrees: Computer Science, Multimedia Engineering and Telematics Engineering. In addition to describing the experience, the paper includes a study of the correlation between the students' assessments of the oral presentations and those of the teachers.
    This work was funded by the Vicerrectorado de Cultura, Igualdad y Planificación of the Universidad de Valencia, within the educational innovation project with file number 118/FO11/49

    Fast channel estimation in the transformed spatial domain for analog millimeter wave systems

    Fast channel estimation in millimeter-wave (mmWave) systems is a fundamental enabler of high-gain beamforming, which boosts coverage and capacity. The channel estimation stage typically involves an initial beam training process in which a subset of the possible beam directions at the transmitter and receiver is scanned along a predefined codebook. Unfortunately, the high number of transmit and receive antennas deployed in mmWave systems increases the complexity of the beam selection and channel estimation tasks. In this work, we tackle the channel estimation problem in analog systems from a different perspective than that of previous works. In particular, we propose to move the channel estimation problem from the angular domain into the transformed spatial domain, in which estimating the angles of arrival and departure corresponds to estimating the angular frequencies of the paths constituting the mmWave channel. The proposed approach, referred to as the transformed spatial domain channel estimation (TSDCE) algorithm, exhibits robustness to additive white Gaussian noise by combining low-rank approximations and sample autocorrelation functions for each path in the transformed spatial domain. Numerical results evaluate the mean square error of the channel estimation and the direction-of-arrival estimation capability. TSDCE significantly reduces the former, while exhibiting a remarkably low computational complexity compared with well-known benchmarking schemes.
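The link between an angle of arrival and an angular frequency, and the use of a sample autocorrelation to estimate that frequency, can be illustrated with a minimal single-path sketch. This is not the TSDCE algorithm: the low-rank approximation step and the multi-path handling are omitted. The uniform linear array with half-wavelength spacing, the noise level and the function names are all assumptions made for the example.

```python
import cmath
import math
import random


def estimate_angular_frequency(samples):
    """Phase of the lag-1 sample autocorrelation of a complex sequence."""
    r1 = sum(samples[n + 1] * samples[n].conjugate()
             for n in range(len(samples) - 1))
    return cmath.phase(r1)


def angle_from_frequency(omega):
    """Assumes a half-wavelength uniform linear array: omega = pi*sin(theta)."""
    return math.asin(omega / math.pi)


# Simulate one path arriving at 20 degrees on a 64-element array.
true_theta = math.radians(20.0)
omega = math.pi * math.sin(true_theta)
rng = random.Random(0)  # fixed seed so the run is repeatable
snapshot = [cmath.exp(1j * omega * n)
            + complex(rng.gauss(0.0, 0.05), rng.gauss(0.0, 0.05))
            for n in range(64)]

omega_hat = estimate_angular_frequency(snapshot)
theta_hat_deg = math.degrees(angle_from_frequency(omega_hat))
```

Averaging the lag-1 products over all antenna pairs is what gives this estimator some robustness to white noise. The cost is linear in the number of antennas, with no search over a beam codebook.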

    Practical considerations for acoustic source localization in the IoT era: Platforms, energy efficiency, and performance

    The rapid development of the Internet of Things (IoT) has brought about important changes in the way emerging acoustic signal processing applications are conceived. While traditional acoustic processing applications have been developed with high-throughput computing platforms equipped with expensive multichannel audio interfaces in mind, the IoT paradigm demands more flexible and energy-efficient systems. In this context, algorithms for source localization and ranging in wireless acoustic sensor networks can be considered an enabling technology for many IoT-based environments, including security, industrial, and health-care applications. This paper evaluates important aspects of the practical deployment of IoT systems for acoustic source localization. Recent systems-on-chip composed of low-power multicore processors, combined with a small graphics accelerator (or GPU), notably increase the computational capacity available for intensive signal processing algorithms while partially retaining the appealing low power consumption of embedded systems. Different algorithms and implementations on several state-of-the-art platforms are discussed, analyzing important aspects such as the tradeoffs between performance, energy efficiency, and exploitation of parallelism under real-time constraints.
    This work was supported in part by the Post-Doctoral Fellowship from Generalitat Valenciana under Grant APOSTD/2016/069, in part by the Spanish Government under Grant TIN2014-53495-R, Grant TIN2015-65277-R, and Grant BIA2016-76957-C3-1-R, and in part by the Universidad Jaume I under Project UJI-B2016-20.

    A Parallel Approach to HRTF Approximation and Interpolation Based on a Parametric Filter Model

    "© 2017 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works."
    Spatial audio-rendering techniques using head-related transfer functions (HRTFs) are currently used in many different contexts such as immersive teleconferencing systems, gaming, or 3-D audio reproduction. Since all these applications usually involve real-time constraints, efficient processing structures for HRTF modeling and interpolation are necessary for providing real-time binaural audio solutions. This letter presents a parametric parallel model that allows us to perform HRTF filtering and interpolation efficiently from an input HRTF dataset. The resulting model, which is an adaptation from a recently proposed modeling technique, not only reduces the size of HRTF datasets significantly, but also allows for simplified interpolation and real-time computation over parallel processors. In order to discuss the suitability of this new model, an implementation over a graphic processing unit is presented.
    This work was supported by the Spanish Ministry of Economy and Competitiveness under Grant TEC2012-37945-C02-02 and FEDER funds and by the UNKP-16-4-III New National Excellence Program of the Hungarian Ministry of Human Capacities. The work of J. A. Belloch was supported by GVA Postdoctoral Contract APOSTD/2016/069.
    Ramos Peinado, G.; Cobos Serrano, M.; Bank, B.; Belloch Rodríguez, JA. (2017). A Parallel Approach to HRTF Approximation and Interpolation Based on a Parametric Filter Model. IEEE Signal Processing Letters. 24(10):1507-1511. https://doi.org/10.1109/LSP.2017.2741724
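A parallel filter structure of the kind the last abstract refers to can be sketched as a sum of independent second-order sections. The section form, coefficient layout and function names below are assumptions chosen for illustration, not the letter's exact parametric model. The point is only that each section depends on the input alone, so the sections can be evaluated concurrently (for instance one per GPU thread) and summed at the end.

```python
def second_order_section(x, b0, b1, a1, a2):
    """Filter x with H(z) = (b0 + b1 z^-1) / (1 + a1 z^-1 + a2 z^-2)."""
    y = []
    x1 = y1 = y2 = 0.0  # previous input sample and two previous outputs
    for xn in x:
        yn = b0 * xn + b1 * x1 - a1 * y1 - a2 * y2
        y.append(yn)
        x1, y2, y1 = xn, y1, yn
    return y


def parallel_filter(x, sections, direct_gain=0.0):
    """Sum of section outputs plus an optional direct path.

    sections: non-empty list of (b0, b1, a1, a2) tuples. The loop over
    sections has no data dependencies, which is what makes the structure
    easy to spread across parallel processors.
    """
    outputs = [second_order_section(x, *s) for s in sections]
    return [direct_gain * xn + sum(column)
            for xn, column in zip(x, zip(*outputs))]


impulse = [1.0, 0.0, 0.0, 0.0]
# Hypothetical coefficients: two simple one-pole sections.
sections = [(1.0, 0.0, -0.5, 0.0), (1.0, 0.0, 0.5, 0.0)]
response = parallel_filter(impulse, sections)
```

In the example the two sections have impulse responses 1, 0.5, 0.25, ... and 1, -0.5, 0.25, ..., so the odd-indexed samples cancel in the sum.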